Rank | Count | Beginning |
---|---|---|
13555 | 5699 | La |
4540 | 2644 | En |
8265 | 1189 | Ĝi |
19675 | 1125 | Li |
24027 | 487 | Post |
3070 | 429 | Dum |
10579 | 405 | Ili |
22747 | 379 | Oni |
28505 | 366 | Tiu |
10103 | 309 | Historio |
3674 | 280 | Ekde |
18856 | 252 | Laŭ |
4246 | 230 | El |
25826 | 230 | Sed |
26386 | 227 | Ŝi |
28239 | 216 | Tio |
24929 | 211 | Pro |
19661 | 206 | Lia |
8222 | 193 | Ĝia |
2551 | 181 | De |
725 | 179 | Ankaŭ |
23844 | 171 | Por |
11127 | 160 | Inter |
27301 | 160 | Tamen |
12240 | 157 | Kiam |
2069 | 150 | CI |
24086 | 140 | Poste |
28670 | 140 | Tiuj |
12402 | 137 | Kiel |
25815 | 132 | Se |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
    select substring_index(sentence, ' ', 1) as beg, count(*) as cnt
    from sentences
    group by substring_index(sentence, ' ', 1)
    order by cnt desc
    limit 50;
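
The same count can be reproduced outside the database. The following is a minimal Python sketch, not the tooling used to build the table above: the function name `sentence_beginnings` and the sample sentences are hypothetical, and it assumes the sentences are already available as a list of strings. Like `substring_index(sentence, ' ', N)`, it treats a single space as the only word separator and keeps the whole sentence when it has fewer than N words.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent beginnings made of the first n words.

    Hypothetical helper that mirrors the SQL query: the beginning is the
    text before the n-th space, or the whole sentence if it has fewer
    than n spaces.
    """
    counts = Counter()
    for sentence in sentences:
        # Split on single spaces only, exactly as substring_index does.
        beginning = " ".join(sentence.split(" ")[:n])
        counts[beginning] += 1
    # most_common sorts by descending count, like "order by cnt desc".
    return counts.most_common(limit)


# Made-up sample sentences, for illustration only (not corpus data).
sample = [
    "La hundo kuras.",
    "La kato dormas.",
    "En la urbo pluvas.",
    "Ĝi estas granda.",
]

for n in (1, 2):
    print(n, sentence_beginnings(sample, n=n, limit=3))
```

Changing `n` to 2, 3, or 4 corresponds to replacing the `1` in both `substring_index` calls of the query, which is presumably how the following subsections are produced.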
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV